consistency model
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (8 more...)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- (2 more...)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.93)
- Asia > China > Beijing > Beijing (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization
With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and training stability. As a time-efficient diffusion model, although consistency models have been validated in online state-based RL, it is still an open question whether it can be extended to visual RL. In this paper, we investigate the impact of non-stationary distribution and the actor-critic framework on consistency policy in online RL, and find that consistency policy was unstable during the training, especially in visual RL with the high-dimensional state space. To this end, we suggest sample-based entropy regularization to stabilize the policy training, and propose a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art (SOTA) performance in 21 tasks across DeepMind control suite and Meta-world. To our knowledge, CP3ER is the first method to apply diffusion/consistency models to visual RL and demonstrates the potential of consistency models in visual RL.